Add tag-based key discovery support in the aws_kms KeyManager plugin#7006
Add tag-based key discovery support in the aws_kms KeyManager plugin#7006amartinezfayo wants to merge 2 commits into
aws_kms KeyManager plugin#7006Conversation
Signed-off-by: Agustín Martínez Fayó <amartinezfayo@gmail.com>
9efe60e to
bc249b3
Compare
There was a problem hiding this comment.
Pull request overview
This PR adds an opt-in, tag-based key discovery mode to the aws_kms KeyManager plugin to avoid scanning all KMS keys/aliases in an account, reducing required permissions and improving performance for large KMS inventories. It introduces discovery and reclamation logic based on SPIRE-managed KMS resource tags, with automatic migration for legacy (alias-managed) keys and documentation updates.
Changes:
- Add
enable_tag_based_key_discoveryconfiguration and wire in AWS Resource Groups Tagging API (tag:GetResources) for key discovery. - Implement tag-based key liveness (
spire-last-update) refresh and stale-key disposal (two-week threshold), with transparent startup migration from alias-managed keys. - Update tests, fake clients, and documentation; add
resourcegroupstaggingapiAWS SDK dependency.
Reviewed changes
Copilot reviewed 7 out of 8 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| pkg/server/plugin/keymanager/awskms/fetcher.go | Adds tag-based key discovery and migration logic using the Resource Groups Tagging API. |
| pkg/server/plugin/keymanager/awskms/client.go | Introduces a tagging API client interface and constructor. |
| pkg/server/plugin/keymanager/awskms/client_fake.go | Extends fakes to support Tagging API and KMS TagResource call recording for tests. |
| pkg/server/plugin/keymanager/awskms/awskms.go | Adds config flag, SPIRE tag schema/constants, tag-based background tasks (keepalive + disposal), and creation-time tag stamping. |
| pkg/server/plugin/keymanager/awskms/awskms_test.go | Adds coverage for tag-based configure/migration, creation-time tag stamping, keepalive tagging, and disposal via tags. |
| go.mod | Bumps aws-sdk-go-v2 and adds resourcegroupstaggingapi dependency. |
| go.sum | Updates dependency checksums accordingly. |
| doc/plugin_server_keymanager_aws_kms.md | Documents tag-based discovery behavior, permissions split, HA considerations, and config examples. |
| // Trigger a goroutine to get the details of the key | ||
| g.Go(func() error { | ||
| entry, err := kf.fetchKeyEntryDetailsFromArn(ctx, keyArn, spireKeyID) | ||
| if err != nil { |
There was a problem hiding this comment.
keyArn and spireKeyID are declared inside the loop body, so each goroutine captures its own copy (and the range var is per-iteration on Go 1.22+ anyway). No race here.
| // Wait for expected deletions to be processed | ||
| for range tt.expectDeleteCount { | ||
| _ = waitForSignal(t, deleteSignal) | ||
| } |
There was a problem hiding this comment.
Range-over-int is valid since Go 1.22 (we are on 1.26 in SPIRE).
| // Note: spire-last-update is intentionally omitted here. It is set exclusively | ||
| // by keepActiveKeys, which runs on a regular schedule. | ||
| func (p *Plugin) buildSPIRETags(serverID, trustDomain string) []types.Tag { |
| // Note: When enabled, the plugin requires permission to use the | ||
| // resourcegroupstaggingapi:GetResources API action. | ||
| EnableTagBasedKeyDiscovery bool `hcl:"enable_tag_based_key_discovery" json:"enable_tag_based_key_discovery"` |
Signed-off-by: Agustín Martínez Fayó <amartinezfayo@gmail.com>
| tagKeyServerTD = "spire-server-td" // Trust domain (no hashing needed - AWS allows dots and long values) | ||
| tagKeyServerID = "spire-server-id" // Server identifier |
There was a problem hiding this comment.
Double checking my assumptions: I see that AWS KMS has a limit on the tag key to be 128 chars and the value to be 256 chars. This should be ok for both of these since the trust domain should be a maximum of 256 chars and the server ID we require to be a UUIDv4, right?
| case useTagBasedDiscovery && len(newConfig.KeyTags) > 0: | ||
| userTags := buildKeyTags(newConfig.KeyTags) | ||
| // Build a fresh slice to avoid mutating the spireTags backing array. | ||
| p.keyTags = make([]types.Tag, 0, len(spireTags)+len(userTags)) | ||
| p.keyTags = append(p.keyTags, spireTags...) | ||
| p.keyTags = append(p.keyTags, userTags...) |
There was a problem hiding this comment.
Should we check that there are no duplicate tags between spireTags and userTags? I don't know if it matters to the API, but the users might want to know.
| Tags: []types.Tag{ | ||
| { | ||
| TagKey: aws.String(tagKeyActive), | ||
| TagValue: aws.String("false"), |
There was a problem hiding this comment.
I think it's possible for the key to be enqueued for deletion and for us to mark the key as inactive, but for the key to not actually be deleted, for example if the key was deleting some other key and then shuts down.
Do we actually need to do the ScheduleKeyDeletion call asynchronously?
The
aws_kmsKeyManager plugin currently discovers its keys by listing every KMS alias and callingDescribeKeyacross the account's keys. This is not the most efficient way to discover SPIRE-managed keys, and it becomes problematic in deployments with a large number of keys in AWS KMS, where it is slow and requires broadkms:ListKeysandkms:DescribeKeypermissions.This PR adds an opt-in tag-based key discovery mode that uses the AWS Resource Groups Tagging API (
tag:GetResources) to find only the keys managed by the plugin, identified by SPIRE-specific tags. This mirrors the label-based approach already used by thegcp_kmsplugin.Changes:
enable_tag_based_key_discoveryconfiguration option (boolean, defaultfalse).spire-server-td,spire-server-id,spire-active) and looked up byspire-key-id.spire-last-updateat creation and during migration, and the value is refreshed periodically (every 6h). Stale keys (no refresh for two weeks) are marked inactive and scheduled for deletion, mirroring the alias-based liveness and reclamation contract.tag:GetResourcespermission now produces an actionable error explaining that the permission is identity-based and cannot be granted through the KMS key policy, instead of a raw AWS error.Permissions:
kms:TagResourceis required when tag-based discovery is enabled (or when usingkey_tags).tag:GetResourcesis required when tag-based discovery is enabled. It belongs to the Resource Groups Tagging API and must be granted in an identity-based IAM policy.kms:ListKeysis only required for the legacy alias-based discovery path.Deprecation:
Alias-based key discovery will be deprecated in a future version and removed in a later one. The default remains alias-based for now, and a warning is logged when it is in use.
HA and operational note:
Within a trust domain,
enable_tag_based_key_discoveryshould be configured consistently across all servers. The two discovery modes use independent liveness signals (aliasLastUpdatedDatevs thespire-last-updatetag), so a partial rollback sustained beyond the two-week window can cause tag-based peers to reclaim a rolled-back server's still-in-use keys. This is documented in the plugin docs.Documentation:
Updated
doc/plugin_server_keymanager_aws_kms.mdwith the new option, the tag schema, the migration behavior, the permission split, the deprecation note, and the HA configuration-consistency guidance.Fixes #6269